作为一个严重的问题,近年来已经广泛研究了单图超分辨率(SISR)。 SISR的主要任务是恢复由退化程序引起的信息损失。根据Nyquist抽样理论,降解会导致混叠效应,并使低分辨率(LR)图像的正确纹理很难恢复。实际上,自然图像中相邻斑块之间存在相关性和自相似性。本文考虑了自相似性,并提出了一个分层图像超分辨率网络(HSRNET)来抑制混叠的影响。我们从优化的角度考虑SISR问题,并根据半季节分裂(HQS)方法提出了迭代解决方案模式。为了先验探索本地图像的质地,我们设计了一个分层探索块(HEB)并进行性增加了接受场。此外,设计多级空间注意力(MSA)是为了获得相邻特征的关系并增强了高频信息,这是视觉体验的关键作用。实验结果表明,与其他作品相比,HSRNET实现了更好的定量和视觉性能,并更有效地释放了别名。
translated by 谷歌翻译
将低分辨率(LR)图像恢复到超分辨率(SR)图像具有正确和清晰的细节是挑战。现有的深度学习工作几乎忽略了图像的固有结构信息,这是对SR结果的视觉感知的重要作用。在本文中,我们将分层特征开发网络设计为探测并以多尺度特征融合方式保持结构信息。首先,我们提出了在传统边缘探测器上的交叉卷积,以定位和代表边缘特征。然后,交叉卷积块(CCBS)设计有功能归一化和渠道注意,以考虑特征的固有相关性。最后,我们利用多尺度特征融合组(MFFG)来嵌入交叉卷积块,并在层次的层次上开发不同尺度的结构特征的关系,调用名为Cross-SRN的轻量级结构保护网络。实验结果表明,交叉SRN通过准确且清晰的结构细节实现了对最先进的方法的竞争或卓越的恢复性能。此外,我们设置了一个标准,以选择具有丰富的结构纹理的图像。所提出的跨SRN优于所选择的基准测试的最先进的方法,这表明我们的网络在保存边缘具有显着的优势。
translated by 谷歌翻译
打开世界对象检测(OWOD),模拟知识持续增长的真正动态世界,试图检测已知和未知的类别,并逐步学习所识别的未知组。我们发现,尽管以前的欧瓦德工作建设性地提出了OWOD定义,但实验设置与不合逻辑的基准,令人困惑的度量计算和不当方法是不合理的。在本文中,我们重新思考OWOD实验环境,并提出了五项基本基准原则,以指导OWOD基准建设。此外,我们设计了两个特定于OWOD问题的公平评估协议,从未知课程的角度填充了评估的空白。此外,我们介绍了一个新颖且有效的OWOD框架,其中包含辅助提案顾问(PAD)和特定于类驱逐分类器(CEC)。非参数垫可以帮助RPN识别无需监控的准确未知提案,而CEC通过特定于类的驱逐函数校准过自信的激活边界并滤除令人困惑的预测。在我们的公平基准上进行的综合实验表明,我们的方法在现有的和我们的新指标方面表明了其他最先进的对象检测方法。\脚注{我们的基准和代码可在https://github.com提供/重新驱动/重新驱动。
translated by 谷歌翻译
为了更好地了解深度神经网络的结构效益和泛化能力,我们首先提出了一种新颖的神经网络模型的理论制定,包括完全连接的残余网络(Reset)和密集连接的网络(Densenet)。其次,我们将两层网络\ CITE {EW2019PRIORITWO}和RESET \ CITE {E2019PRIORIRES}的误差分析扩展到DENSENET,并进一步显示满足某些温和条件的神经网络,可以获得类似的估计。这些估计本质上是先验的,因为它们依赖于在训练过程之前的信息上依赖于信息,特别是估计误差的界限与输入维度无关。
translated by 谷歌翻译
单像超分辨率(SISR),作为传统的不良反对问题,通过最近的卷积神经网络(CNN)的发展得到了极大的振兴。这些基于CNN的方法通常将低分辨率图像映射到其相应的高分辨率版本,具有复杂的网络结构和损耗功能,显示出令人印象深刻的性能。本文对传统的SISR算法提供了新的洞察力,并提出了一种基本上不同的方法,依赖于迭代优化。提出了一种新颖的迭代超分辨率网络(ISRN),顶部是迭代优化。我们首先分析图像SR问题的观察模型,通过以更一般和有效的方式模仿和融合每次迭代来激发可行的解决方案。考虑到批量归一化的缺点,我们提出了一种特征归一化(F-NOM,FN)方法来调节网络中的功能。此外,开发了一种具有FN的新颖块以改善作为FNB称为FNB的网络表示。剩余剩余结构被提出形成一个非常深的网络,其中FNBS与长时间跳过连接,以获得更好的信息传递和稳定训练阶段。对BICUBIC(BI)降解的测试基准的广泛实验结果表明我们的ISRN不仅可以恢复更多的结构信息,而且还可以获得竞争或更好的PSNR / SSIM结果,与其他作品相比,参数更少。除BI之外,我们除了模拟模糊(BD)和低级噪声(DN)的实际降级。 ISRN及其延伸ISRN +两者都比使用BD和DN降级模型的其他产品更好。
translated by 谷歌翻译
抽象推理是指分析信息,以无形层面发现规则以及以创新方式解决问题的能力。 Raven的渐进式矩阵(RPM)测试通常用于检查抽象推理的能力。要求受试者从答案集中确定正确的选择,以填充RPM右下角(例如,3 $ \ times $ 3矩阵),按照矩阵内的基本规则。最近利用卷积神经网络(CNN)的研究取得了令人鼓舞的进步,以实现RPM测试。但是,它们部分忽略了RPM求解器的必要归纳偏置,例如每个行/列内的订单灵敏度和增量规则诱导。为了解决这个问题,在本文中,我们提出了一个分层的规则感知网络(SRAN),以生成两个输入序列的规则嵌入。我们的SRAN学习了不同级别的多个粒度规则嵌入,并通过封闭的融合模块逐步整合了分层的嵌入流。借助嵌入,应用规则相似性度量标准来确保SRAN不仅可以使用Tuplet损失对SRAN进行训练,还可以有效地推断出最佳答案。我们进一步指出,用于RPM测试的流行Raven数据集中存在的严重缺陷,这阻止了对抽象推理能力的公平评估。为了修复缺陷,我们提出了一种称为属性分配树(ABT)的答案集合生成算法,形成了一个改进的数据集(简称I-Raven)。在PGM和I-Raven数据集上进行了广泛的实验,这表明我们的Sran的表现优于最先进的模型。
translated by 谷歌翻译
The ''Propose-Test-Release'' (PTR) framework is a classic recipe for designing differentially private (DP) algorithms that are data-adaptive, i.e. those that add less noise when the input dataset is nice. We extend PTR to a more general setting by privately testing data-dependent privacy losses rather than local sensitivity, hence making it applicable beyond the standard noise-adding mechanisms, e.g. to queries with unbounded or undefined sensitivity. We demonstrate the versatility of generalized PTR using private linear regression as a case study. Additionally, we apply our algorithm to solve an open problem from ''Private Aggregation of Teacher Ensembles (PATE)'' -- privately releasing the entire model with a delicate data-dependent analysis.
translated by 谷歌翻译
Image token removal is an efficient augmentation strategy for reducing the cost of computing image features. However, this efficient augmentation strategy has been found to adversely affect the accuracy of CLIP-based training. We hypothesize that removing a large portion of image tokens may improperly discard the semantic content associated with a given text description, thus constituting an incorrect pairing target in CLIP training. To address this issue, we propose an attentive token removal approach for CLIP training, which retains tokens with a high semantic correlation to the text description. The correlation scores are computed in an online fashion using the EMA version of the visual encoder. Our experiments show that the proposed attentive masking approach performs better than the previous method of random token removal for CLIP training. The approach also makes it efficient to apply multiple augmentation views to the image, as well as introducing instance contrastive learning tasks between these views into the CLIP framework. Compared to other CLIP improvements that combine different pre-training targets such as SLIP and MaskCLIP, our method is not only more effective, but also much more efficient. Specifically, using ViT-B and YFCC-15M dataset, our approach achieves $43.9\%$ top-1 accuracy on ImageNet-1K zero-shot classification, as well as $62.7/42.1$ and $38.0/23.2$ I2T/T2I retrieval accuracy on Flickr30K and MS COCO, which are $+1.1\%$, $+5.5/+0.9$, and $+4.4/+1.3$ higher than the SLIP method, while being $2.30\times$ faster. An efficient version of our approach running $1.16\times$ faster than the plain CLIP model achieves significant gains of $+5.3\%$, $+11.3/+8.0$, and $+9.5/+4.9$ on these benchmarks.
translated by 谷歌翻译
Full-body reconstruction is a fundamental but challenging task. Owing to the lack of annotated data, the performances of existing methods are largely limited. In this paper, we propose a novel method named Full-body Reconstruction from Part Experts~(FuRPE) to tackle this issue. In FuRPE, the network is trained using pseudo labels and features generated from part-experts. An simple yet effective pseudo ground-truth selection scheme is proposed to extract high-quality pseudo labels. In this way, a large-scale of existing human body reconstruction datasets can be leveraged and contribute to the model training. In addition, an exponential moving average training strategy is introduced to train the network in a self-supervised manner, further boosting the performance of the model. Extensive experiments on several widely used datasets demonstrate the effectiveness of our method over the baseline. Our method achieves the state-of-the-art performance. Code will be publicly available for further research.
translated by 谷歌翻译
In recent years, applying deep learning (DL) to assess structural damages has gained growing popularity in vision-based structural health monitoring (SHM). However, both data deficiency and class-imbalance hinder the wide adoption of DL in practical applications of SHM. Common mitigation strategies include transfer learning, over-sampling, and under-sampling, yet these ad-hoc methods only provide limited performance boost that varies from one case to another. In this work, we introduce one variant of the Generative Adversarial Network (GAN), named the balanced semi-supervised GAN (BSS-GAN). It adopts the semi-supervised learning concept and applies balanced-batch sampling in training to resolve low-data and imbalanced-class problems. A series of computer experiments on concrete cracking and spalling classification were conducted under the low-data imbalanced-class regime with limited computing power. The results show that the BSS-GAN is able to achieve better damage detection in terms of recall and $F_\beta$ score than other conventional methods, indicating its state-of-the-art performance.
translated by 谷歌翻译